What Makes a Hollywood Movie a Hit or a Flop?
Final Project
Data Science 1 with R (STAT 301-1)
Introduction
As an avid movie lover, I have always been curious about why some Hollywood movies become critically acclaimed blockbusters while others fade into the background. Beyond opening weekend numbers alone, I am interested in exploring how a broader set of variables interact to determine whether a film ultimately becomes a hit or a flop. Specifically, I focus on five factors: critic and audience ratings, opening weekend revenue, gross (domestic, foreign, and worldwide), budget and budget recovery, and Oscar wins. I am also curious whether the time of year (season) of a movie’s release affects its success, and whether there is a particular season in which the most successful movies are released. By focusing on these variables, I hope to answer my research question by identifying patterns and notable correlations among them, using analyses ranging from univariate to multivariate. In particular, I want to explore whether certain variables affect one another and how they work together to contribute to a movie’s overall success. To carry out this analysis, I use a data set from the Kaggle website called “Hollywood Hits and Flops (2007 - 2023)”, described in the next section.
Data Overview and Quality
text
Many variables in the raw data set were not in a form conducive to analysis: several were stored as character-type variables, and the Oscar variable was not stored as a Boolean (logical) value.
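A minimal sketch of this kind of cleaning follows. Python is used here only for illustration (the project itself was done in R), and the column names, the string formats such as `"$150,000,000"`, and the rule that a blank Oscar entry means zero wins are all assumptions rather than the data set’s documented schema.

```python
from typing import Optional

# Hypothetical raw rows: every field arrives as a character string.
raw_rows = [
    {"title": "Movie A", "budget": "$150,000,000", "oscars": "2"},
    {"title": "Movie B", "budget": "20,000,000", "oscars": ""},
    {"title": "Movie C", "budget": "", "oscars": "0"},
]


def parse_money(value: str) -> Optional[float]:
    """Strip '$' and ',' and convert to float; a blank string becomes None (missing)."""
    cleaned = value.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def parse_oscar_wins(value: str) -> int:
    """Assumption: a blank or non-numeric entry means zero Oscar wins."""
    value = value.strip()
    return int(value) if value.isdigit() else 0


clean_rows = []
for row in raw_rows:
    wins = parse_oscar_wins(row["oscars"])
    clean_rows.append({
        "title": row["title"],
        "budget": parse_money(row["budget"]),
        "oscar_wins": wins,
        # Boolean indicator used for winner vs. non-winner comparisons
        "won_oscar": wins >= 1,
    })
```

Keeping both the count (`oscar_wins`) and the Boolean (`won_oscar`) lets the same cleaned table support comparisons of Oscar winners against non-winners as well as totals of Oscar wins by category.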
Explorations: What variables contribute to a movie’s overall success rate?
Variable 1: Ratings
Within this data set, the three main movie rating measures are the Rotten Tomatoes score (audience and critic), the Metacritic score (audience and critic), and the IMDb rating.
Figure 1 visualizes how the average of Rotten Tomatoes and Metacritic scores has changed over the years, separated into audience and critic rating groups. Overall, these ratings appear to have increased slightly since 2007. Many factors could explain this gradual increase: perhaps the quality of movies has improved over time, or reviewers have become more lenient in their scores. Additionally, the audience rating group seems to consistently give higher ratings than the critic rating group, possibly suggesting that audiences are less harsh when reviewing films. This analysis helps us understand how much the behavior of the audience and critic rating groups differs, and it visualizes the overall trend in ratings over the years.
Figure 2 uses two measures of movie success that are determined by critics and industry professionals rather than by general audiences: Oscar wins (voted on by members of the Academy of Motion Picture Arts and Sciences) and average critic ratings. As the box plot shows, Hollywood movies that have won at least one Oscar have a higher average of Rotten Tomatoes and Metacritic critic ratings than those that have not won any Oscars. This association suggests that critical assessment and award recognition tend to align: movies praised enough to win an Oscar are also rated highly by Rotten Tomatoes and Metacritic critics.
A movie’s Rotten Tomatoes critic rating is typically available before the movie hits theaters. Thus, I was interested in exploring the extent to which this rating influences a movie’s opening weekend revenue. Figure 3 shows, however, that the correlation between these two variables is not very strong. There is a very slight positive association, suggesting that, to some extent, opening weekend earnings increase as a movie’s Rotten Tomatoes critic rating increases. Because this association is so weak, however, the Rotten Tomatoes critic score does not appear to have a strong or direct impact on opening weekend revenue.
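To make “weak” concrete, the Pearson correlation coefficient r can be computed directly from two columns. The Python sketch below uses invented numbers purely for illustration (they are not values from the data set). By a common rule of thumb, |r| below roughly 0.3 is often described as weak, although cut-offs vary between fields.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Invented illustrative values, not taken from the data set.
rt_critic_score = [45, 60, 72, 88, 93, 30]
opening_weekend_millions = [40.0, 12.5, 55.0, 20.0, 35.0, 25.0]

r = pearson_r(rt_critic_score, opening_weekend_millions)
print(f"r = {r:.2f}")
```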
Figure 4 visualizes the average IMDb, Metacritic, and Rotten Tomatoes critic ratings for each unique script type combination of Hollywood movies from 2007-2022. One key point is that the average IMDb rating is available for only 5 of the 16 script types, revealing a large amount of missingness in this variable and making it difficult to reach a conclusion about the relationship between script type and average IMDb rating. For the other two rating variables, the script type with both the highest Metacritic and the highest Rotten Tomatoes critic ratings is “documentary”, suggesting that this script type is more favorable among critics than other script types.
In this data set, “audience vs. critics deviance” refers to the difference between a movie’s average of its Rotten Tomatoes and Metacritic critic ratings and its average of its Rotten Tomatoes and Metacritic audience ratings. A negative deviance means the audience rating was higher than the critic rating, and vice versa. Figure 5 shows this deviance faceted by genre. With a majority of the bars below zero, audience rating groups gave higher ratings than critic rating groups for most genres. The genre with the lowest absolute deviance is “biography”, suggesting that audience and critic rating groups rated movies of this genre most similarly. On the other hand, the genre with the greatest disparity in ratings is “sci-fi”, suggesting that audiences and critics differ most in their rating behavior for this genre.
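The deviance definition above can be written as a small function. This Python sketch assumes all four scores are on a common 0-100 scale; a Metacritic user score reported on a 0-10 scale would first need to be multiplied by 10. The function name is hypothetical.

```python
def audience_critic_deviance(rt_critic, mc_critic, rt_audience, mc_audience):
    """Critic average minus audience average (all scores assumed on a 0-100 scale).

    A negative result means audiences rated the movie higher than critics did.
    """
    critic_avg = (rt_critic + mc_critic) / 2
    audience_avg = (rt_audience + mc_audience) / 2
    return critic_avg - audience_avg


# Invented example: critics average 68, audiences average 82.
print(audience_critic_deviance(70, 66, 85, 79))  # -14.0
```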
In Figure 6, the deviance between Rotten Tomatoes audience ratings and Metacritic audience ratings is explored by genre. A positive deviance means the average Rotten Tomatoes audience score was higher than the Metacritic audience score, and vice versa. With all of the bars in this graph above zero except for fantasy, Rotten Tomatoes audience users gave higher ratings than the Metacritic audience rating group in every genre except fantasy. Interestingly, Figure 6 also displays the exact opposite of the lowest and highest deviances found in Figure 5. This time, the genre with the lowest absolute deviance is “sci-fi”, suggesting that Rotten Tomatoes and Metacritic audiences rated movies in this category most similarly, while the genre in which Rotten Tomatoes and Metacritic audiences differed the most is “biography”. Because “sci-fi” and “biography” appear as the extreme deviances in both Figure 5 and Figure 6, these two genres may be the ones on which opinions vary the most.
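Finding the genres with the smallest and largest absolute mean deviance, as done for Figures 5 and 6, amounts to grouping by genre, averaging, and comparing absolute values. The genre-deviance pairs in this Python sketch are invented for illustration and do not reproduce either figure.

```python
from collections import defaultdict
from statistics import mean

# Invented (genre, deviance) pairs for illustration; not the values behind Figures 5 or 6.
movie_deviances = [
    ("Drama", 1.0), ("Drama", -0.5),
    ("Fantasy", -8.0), ("Fantasy", -6.0),
    ("Comedy", 3.0),
    ("Horror", 5.0),
]


def mean_by_group(pairs):
    """Average the values for each group key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


genre_means = mean_by_group(movie_deviances)
# Smallest |mean deviance| = most similar ratings; largest = greatest disparity.
most_similar = min(genre_means, key=lambda g: abs(genre_means[g]))
most_different = max(genre_means, key=lambda g: abs(genre_means[g]))
print(most_similar, most_different)  # Drama Fantasy
```

Comparing absolute values matters here: a genre whose mean deviance is strongly negative (Fantasy in this toy data) still counts as the largest disparity.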
Figure 7 explores the relationship between both rating groups’ (audience and critic) ratings and domestic gross, as well as how genre plays into this relationship. Overall, there seems to be a positive correlation between a movie’s rating and its domestic gross earnings: as a movie’s rating increases, its domestic gross revenue also tends to increase. However, this relationship appears steeper for audience ratings than for critic ratings, suggesting that domestic gross rises faster with each increase in audience rating than with each increase in critic rating. Looking at the different genres, the “action” genre has the steepest relationship in both plots, and it is again steeper for audience ratings. This suggests that for action movies specifically, domestic gross increases more with rating than it does for other genres, and that this increase is larger for audience ratings than for critic ratings.
Figure 8 displays the movie distributors with the highest average Rotten Tomatoes and Metacritic ratings for both audience and critic rating groups. The distributor with the highest average critic rating is “A24”, and the distributor with the highest average audience rating is “Atlas Distribution Company”. This could suggest that movies released by A24 were favored most by critics, and movies released by Atlas Distribution Company were favored most by audiences.
Variable 2: Opening Weekend Revenue
A movie’s opening weekend revenue refers to the total box office earnings of the film during its first weekend of release in theaters.
Figure 9 visualizes the change in mean opening weekend earnings (in millions) for Hollywood movies from 2007-2022. The graph has two distinct low points, in 2008 and 2020, and these drops likely reflect the economic conditions of those years. In 2008, the country was in the Great Recession, an economic downturn that greatly impacted the film industry. This crisis led to a sharp decline in consumer spending and movie production, possibly causing the drop in mean opening weekend earnings that we see for this year. In 2020, we see a far more drastic drop in mean opening weekend revenue, as the COVID-19 pandemic led to widespread movie theater shutdowns and capacity limits. Under these conditions, movie theater ticket sales fell dramatically, and so did the mean opening weekend revenue of movies released during the pandemic, as shown in the graph. These findings are important to keep in mind throughout this variable analysis, as opening weekend revenue is highly sensitive to economic crises such as the 2008 Great Recession and the 2020 COVID-19 pandemic.
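The “distinct low points” in a yearly series like this one can be identified programmatically as years whose mean is lower than both neighboring years. This is a hypothetical Python sketch with invented values, assuming the aggregated series has one entry per consecutive year; the first and last years cannot be flagged because each lacks a neighbor on one side.

```python
from collections import defaultdict
from statistics import mean

# Invented (release_year, opening_weekend_millions) rows for illustration only.
rows = [
    (2010, 30.0), (2010, 40.0),
    (2011, 20.0),
    (2012, 33.0),
    (2013, 31.0),
    (2014, 45.0),
]


def yearly_means(pairs):
    """Mean value per year, returned in ascending year order."""
    by_year = defaultdict(list)
    for year, value in pairs:
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def local_minima(series):
    """Years whose mean is below both the previous and the following year's mean."""
    years = list(series)
    return [
        years[i]
        for i in range(1, len(years) - 1)
        if series[years[i]] < series[years[i - 1]]
        and series[years[i]] < series[years[i + 1]]
    ]


print(local_minima(yearly_means(rows)))  # [2011, 2013]
```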
Figure 10 explores the relationship between a movie’s opening weekend revenue and how much it earns toward recovering its production cost (budget recovery), categorized by script type and genre. Overall, there is a clear, strong positive correlation between opening weekend revenue and budget recovery, suggesting that as the amount a movie earns during its first weekend in theaters increases, the amount it earns toward recovering its budget also increases. However, this correlation varies across genres and script types. Among genres, the ‘adventure’ category has the highest correlation, which may suggest that adventure movies recover more of their budget as their opening weekend revenue increases than other genres do. Among script types, the ‘remake’ category has the highest correlation, similarly suggesting that remakes recover more of their budget as their opening weekend earnings increase.
Figure 11 shows that the genre combination with the greatest average opening weekend revenue is sci-fi & fantasy, and the script type combination with the greatest average opening weekend revenue is sequel & adaptation. This suggests that sci-fi/fantasy genre hybrids earned more during their first weekend of release than other genre combinations, and that sequel/adaptation script type hybrids likewise earned more than other script type combinations.
Figure 12 shows that Hollywood movies that have won at least one Oscar have an average opening weekend revenue that is actually lower than that of movies that have not won any Oscars. This suggests that opening weekend success is not positively associated with winning an Oscar. In other words, a high opening weekend revenue may not increase a movie’s chance of winning an Oscar.
Figure 13 displays very strong positive correlations both between opening weekend revenue and domestic gross and between opening weekend revenue and foreign gross. This suggests that a Hollywood movie’s performance during its opening weekend is strongly associated with its overall domestic and foreign grosses: as opening weekend earnings increase, so do domestic and foreign grosses. Additionally, the relationship with domestic gross appears slightly steeper than the relationship with foreign gross, suggesting that opening weekend performance is associated with a slightly larger increase in domestic gross than in foreign gross.
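“Steeper” refers to the slope of the fitted line, which is distinct from the strength of the correlation: two relationships can have similar correlations and still differ in slope. A minimal least-squares slope in Python, with invented numbers (in millions) for illustration only:

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented values in millions of dollars, for illustration only.
opening = [10.0, 20.0, 30.0, 40.0]
domestic = [32.0, 61.0, 95.0, 124.0]
foreign = [25.0, 50.0, 72.0, 100.0]

print(round(ols_slope(opening, domestic), 2))  # 3.1
print(round(ols_slope(opening, foreign), 2))   # 2.47
```

In this toy data, each extra million of opening weekend revenue goes with about 3.1 million of domestic gross but about 2.47 million of foreign gross, which is what a “slightly steeper” domestic line would mean numerically.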
Variable 3: Domestic, Foreign, & Worldwide Gross
Figure 14 visualizes the change in the yearly average domestic gross (in millions) for Hollywood movies from 2007-2022. Just as in Figure 9, there are significant drops in 2008 and 2020, likely reflecting the country’s economic conditions in those years. During the 2008 Great Recession, declines in consumer spending directly impacted the total box office revenue of movies. During the 2020 COVID-19 pandemic, quarantines and movie theater closures also led to declines in consumer spending and a direct decline in domestic gross revenue for movies. Like opening weekend revenue, domestic gross is heavily impacted by economic crises such as the 2008 Great Recession and the 2020 COVID-19 pandemic.
Figure 15 displays a strong positive correlation between the domestic and foreign gross earnings of Hollywood movies: as a movie’s domestic gross increases, its foreign gross also tends to increase. This suggests that US and foreign audiences have broadly similar preferences when it comes to which movies become popular.
Figure 16 offers another comparison of movie preferences between domestic and foreign audiences, this time comparing gross performance across movie genres. Ranking genres by highest average gross revenue, the “sci-fi” category has the best domestic performance, while the “action” and “adventure” categories are tied for the best foreign performance. This suggests a difference in genre popularity between the two audiences: US audiences show a strong preference for sci-fi movies, while foreign audiences show a strong preference for action and adventure movies. In other words, a sci-fi movie may perform better in US theaters than abroad, while action and adventure movies may perform better in foreign theaters.
As a final comparison of movie preferences between domestic and foreign audiences, Figure 17 shows the five movie distributors with the highest average domestic and foreign gross revenues. For both US and foreign audiences, the distributor with the most successful gross performance is Walt Disney Studios. This reveals a similarity between domestic and foreign audiences: movies distributed by Walt Disney Studios are more popular (generate more gross revenue) than movies released by other distributors.
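Ranking distributors by average gross can be sketched as grouping, averaging, and sorting. The optional `min_films` threshold in this Python sketch is an addition that the original analysis does not necessarily use; it guards against a distributor with a single hit topping the ranking. Distributor names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def top_n_by_mean(pairs, n=5, min_films=1):
    """Rank groups by mean value, keeping only groups with at least min_films entries."""
    groups = defaultdict(list)
    for name, value in pairs:
        groups[name].append(value)
    averages = {
        name: mean(values)
        for name, values in groups.items()
        if len(values) >= min_films
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical distributors and invented gross values (millions).
films = [
    ("Distributor A", 300.0), ("Distributor A", 500.0),
    ("Distributor B", 900.0),
    ("Distributor C", 120.0), ("Distributor C", 80.0),
]

print(top_n_by_mean(films, n=2))               # [('Distributor B', 900.0), ('Distributor A', 400.0)]
print(top_n_by_mean(films, n=2, min_films=2))  # [('Distributor A', 400.0), ('Distributor C', 100.0)]
```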
In Figure 18, there is a clear positive relationship between a movie’s worldwide gross earnings and the percentage of its budget that is recovered. This suggests that the more box office revenue a movie earns, the more of its budget it recovers after its release in theaters.
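The exact formula behind the data set’s budget recovery variable is not stated here; a plausible definition, assumed in this Python sketch, is worldwide gross expressed as a percentage of production budget. If the data set does define it this way, part of the relationship in Figure 18 is built in, since worldwide gross appears in the numerator. Note also that theaters keep a share of ticket revenue and marketing costs are not part of the production budget, so a value above 100% does not by itself mean a film was profitable.

```python
def budget_recovered_pct(worldwide_gross, budget):
    """Assumed definition: worldwide gross as a percentage of production budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return 100.0 * worldwide_gross / budget


# Invented example: a $150M film grossing $300M worldwide.
print(budget_recovered_pct(300e6, 150e6))  # 200.0
```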
Variable 4: Budget & Budget Recovery
Figure 19 follows the same pattern as Figure 9 and Figure 14, showing that movie budgets are also highly sensitive to economic crises. This graph also has two distinct low points, in 2008 and 2020. During the 2008 Great Recession, financial challenges could have led to cost-cutting measures and more stringent budgeting by movie distributors, lowering the average movie budget for that year. During the 2020 COVID-19 pandemic and quarantine, film studios may have altered their production strategies by delaying the start of filming, leading to an overall decline in film production and thus in mean budgets for that year. Taken together, these three similar findings suggest a common trend: a movie’s success is strongly affected by the state of the economy.
In Figure 20, there is a clear positive association between a Hollywood movie’s budget and both its opening weekend earnings and its overall worldwide earnings. This suggests that, on average, movies with higher production budgets tend to achieve greater box office success: as budgets increase, opening weekend revenue and worldwide gross revenue also tend to increase.
Figure 21 visualizes the distribution of movie production budgets for each genre category, with the fantasy genre having the highest average budget. This may be because fantasy productions usually involve elaborate visual effects, intricate makeup and costumes, computer-generated imagery (CGI), and other advanced technologies to create mythical worlds and landscapes, requiring substantial investment in technology, skilled artists, and post-production processes that contribute to a high average budget.
Similar to Figure 12, Figure 22 shows that Hollywood movies that have won at least one Oscar have an average production budget that is actually lower than that of movies with no Oscar wins. This could suggest that a high production budget does not increase a movie’s chances of winning an Oscar, and that budget may not be a factor Oscar voters take into account.
Figure 23 explores how a movie’s production budget is correlated with three rating measures: the average of Rotten Tomatoes and Metacritic critic scores, the average of Rotten Tomatoes and Metacritic audience scores, and IMDb ratings. All three graphs show very weak positive correlations, with the data points widely scattered. This could suggest a slight tendency for movies with higher budgets to receive higher ratings, but the relationship is weak, and budget is not a direct determinant of rating success.
Variable 5: Oscar Wins
Figure 24 displays that the genre combination with the most Oscar wins is “biography, history”, and the script type with the most Oscar wins is “original screenplay”. This suggests that movies in the “biography, history” genre combination or with an “original screenplay” script type have been more successful among Oscar voters.
Figure 25 shows that Hollywood movies that have won at least one Oscar have a higher average worldwide gross than movies that have not won any Oscars. This could suggest a link between the two variables: Oscar-winning movies also tend to perform better at the worldwide box office than movies that have not won any Oscars.
Variable 6: Seasonal Release Date
These analyses explore how the five main variables above vary with the season in which a movie is released, and what seasonal trends may influence a movie’s success.
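Assigning each movie a season from its release date requires choosing month cut-offs. This Python sketch assumes meteorological seasons (December through February as Winter, and so on); the project’s own definition may differ, and the mapping name is hypothetical.

```python
from datetime import date

# Assumed meteorological seasons; the project's own month cut-offs may differ.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def release_season(release_date: date) -> str:
    """Map a release date to a season label using its month."""
    return SEASON_BY_MONTH[release_date.month]


print(release_season(date(2019, 11, 22)))  # Fall
```

Because December falls in Winter under this convention, a late-December release is grouped with the following January and February, which can shift seasonal averages compared with a calendar-quarter split.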
Figure 26 compares the average ratings for each season between the critic and audience rating groups. Both rating groups show a similar seasonal pattern, giving the highest ratings to movies released in the Fall and the lowest ratings to movies released in the Winter. However, the taller bars in the plot on the right show that the audience rating group gives higher ratings than the critic rating group, consistent with Figure 1. Figure 26 thus shows one way in which the two groups’ rating patterns are similar and confirms a previously found way in which they differ. Overall, movies released in the Fall have the highest ratings, while movies released in the Winter have the lowest.
Figure 27 shows that movies released in the Spring have the highest average opening weekend revenue. This could suggest that Spring releases are more successful at generating earnings during their first weekend in theaters than movies released in other seasons.
Figure 28 shows that movies released during the Summer months have the highest average worldwide gross. This may be because children in many countries around the world are on summer vacation during these months, so families are more likely to go to the movies, increasing ticket sales.
Figure 29 shows that movies released in the Spring have the highest average movie budgets. This aligns with previous findings in the EDA. Figure 20 showed a positive association between a Hollywood movie’s budget and its opening weekend earnings, and Figure 27 showed that Spring releases have the highest average opening weekend revenue. Spring also having the highest average budget is consistent with these findings, although a positive correlation alone does not guarantee that the same season will lead in both measures. This supports our finding of a positive relationship between a movie’s budget and its opening weekend revenue.
In Figure 30, movies released in the Fall won considerably more Oscars than movies released in other seasons. This may be because Fall releases come shortly before Oscar voting begins, so these films are more salient and relevant to voters, while still leaving enough time for them to gain popularity and traction before the awards are given out. From this, if a film’s success is defined solely by its number of Oscar wins, releasing it in the Fall may substantially increase its chances of success.
Conclusion
text
References
text
Appendix: technical info
text
Appendix: extra explorations
Movie with the highest:
- average critic rating
- average audience rating
- opening weekend revenue
- domestic gross
- foreign gross
- worldwide gross
- budget
- budget recovered
- Oscar wins
- IMDb rating